Day 4 - Introduction to Data Analysis with R
Freie Universität Berlin - Theoretical Ecology
October 17, 2023
Today's schedule
Now - 14 (or 14.30 if you are still enthusiastic): Work on the data set(s)
14 (14.30) - 15: Short feedback round
15-16: Feedback, conclusion
corrplot package for correlation plots
factoextra package for PCA visualization
NA values: use tidyr::drop_na() to remove all rows with NA values from the data first
Physicochemical properties of wine and quality judgements
'data.frame': 1599 obs. of 12 variables:
$ fixed.acidity : num 12.7 9.8 6.5 8.6 7.5 7.6 10.1 6.4 6.1 6.7 ...
$ volatile.acidity : num 0.6 0.66 0.88 0.52 0.58 0.5 0.935 0.4 0.58 0.46 ...
$ citric.acid : num 0.49 0.39 0.03 0.38 0.14 0.29 0.22 NA 0.23 0.24 ...
$ residual.sugar : num 2.8 3.2 NA 1.5 2.2 2.3 3.4 1.6 2.5 1.7 ...
$ chlorides : num 0.075 0.083 0.079 0.096 0.077 NA 0.105 0.066 0.044 0.077 ...
$ free.sulfur.dioxide : num 5 21 23 5 27 5 11 5 16 18 ...
$ total.sulfur.dioxide: num NA 59 47 18 60 NA 86 12 70 34 ...
$ density : num 0.999 0.999 0.996 NA 0.996 ...
$ pH : num 3.14 3.37 NA 3.2 3.28 3.32 3.43 3.34 3.46 3.39 ...
$ sulphates : num 0.57 0.71 0.5 0.52 0.59 NA 0.64 NA NA 0.6 ...
$ alcohol : num 11.4 11.5 11.2 9.4 9.8 11.5 11.3 9.2 12.5 10.6 ...
$ quality : int 5 7 4 5 5 6 4 5 6 6 ...
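A minimal sketch of how these ideas could fit together. The small table below only reuses the first ten values of a few columns from the str() output above; the object names (wine, wine_complete, cor_mat, pca) are made up for illustration, and the plotting calls assume that corrplot and factoextra are installed.

```r
library(dplyr)
library(tidyr)

# Stand-in for the wine data: first ten values of five columns from str() above
wine <- tibble::tibble(
  fixed.acidity  = c(12.7, 9.8, 6.5, 8.6, 7.5, 7.6, 10.1, 6.4, 6.1, 6.7),
  citric.acid    = c(0.49, 0.39, 0.03, 0.38, 0.14, 0.29, 0.22, NA, 0.23, 0.24),
  residual.sugar = c(2.8, 3.2, NA, 1.5, 2.2, 2.3, 3.4, 1.6, 2.5, 1.7),
  pH             = c(3.14, 3.37, NA, 3.2, 3.28, 3.32, 3.43, 3.34, 3.46, 3.39),
  alcohol        = c(11.4, 11.5, 11.2, 9.4, 9.8, 11.5, 11.3, 9.2, 12.5, 10.6),
  quality        = c(5L, 7L, 4L, 5L, 5L, 6L, 4L, 5L, 6L, 6L)
)

# Drop every row that contains at least one NA, then treat quality as a factor
wine_complete <- wine |>
  drop_na() |>
  mutate(quality = as.factor(quality))

# Correlation matrix and PCA on the numeric predictors only
predictors <- select(wine_complete, -quality)
cor_mat <- cor(predictors)
pca <- prcomp(predictors, scale. = TRUE)

# Plots (skipped if the packages are not installed)
if (requireNamespace("corrplot", quietly = TRUE)) {
  corrplot::corrplot(cor_mat)
}
if (requireNamespace("factoextra", quietly = TRUE)) {
  print(factoextra::fviz_pca_biplot(pca))
}
```

With the full data set you would read in the file instead of typing the values; the remaining steps should stay the same.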
dplyr
dplyr::mutate and as.factor() to transform the column
janitor::clean_names() function
Most important variables:
| variable | class | description |
|---|---|---|
| gender | character | Binary gender |
| event | character | Event name |
| medal | character | Medal type |
| athlete | character | Athlete name (LAST NAME first name) |
| abb | character | Country abbreviation |
| country | character | Country name |
| type | character | Type of sport |
| year | double | year of games |
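One possible first step with these variables, as a sketch: keep only the rows where a medal was won and count medals per country. The rows below are invented for illustration (athletes, countries and results are not from the real data); only the column names follow the table above.

```r
library(dplyr)

# Hypothetical example rows using a subset of the columns listed above
medals <- tibble::tibble(
  athlete = c("DOE Jane", "ROE Max", "POE Kim", "LOE Sam"),
  country = c("Country A", "Country A", "Country B", "Country B"),
  medal   = c("gold", NA, "silver", "gold"),
  year    = c(2016, 2016, 2016, 2012)
)

# Keep only medal winners, then count medals per country and medal type
medal_counts <- medals |>
  filter(!is.na(medal)) |>
  count(country, medal, name = "n_medals")

medal_counts
```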
Get the data:
dplyr
!is.na(medal))
Atlantic marsh fiddler crab (Minuca pugnax)
# A tibble: 6 × 9
date latitude site size air_temp air_temp_sd water_temp water_temp_sd
<date> <dbl> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 2016-07-24 30 GTM 12.4 21.8 6.39 24.5 6.12
2 2016-07-24 30 GTM 14.2 21.8 6.39 24.5 6.12
3 2016-07-24 30 GTM 14.5 21.8 6.39 24.5 6.12
4 2016-07-24 30 GTM 12.9 21.8 6.39 24.5 6.12
5 2016-07-24 30 GTM 12.4 21.8 6.39 24.5 6.12
6 2016-07-24 30 GTM 13.0 21.8 6.39 24.5 6.12
# ℹ 1 more variable: name <chr>
Ideas - known methods
Temperature and ice duration on lakes since 19th century
Ice data:
# A tibble: 6 × 5
lakeid ice_on ice_off ice_duration year
<fct> <date> <date> <dbl> <dbl>
1 Lake Mendota NA 1853-04-05 NA 1852
2 Lake Mendota 1853-12-27 NA NA 1853
3 Lake Mendota 1855-12-18 1856-04-14 118 1855
4 Lake Mendota 1856-12-06 1857-05-06 151 1856
5 Lake Mendota 1857-11-25 1858-03-26 121 1857
6 Lake Mendota 1858-12-08 1859-03-14 96 1858
Temperature data:
# A tibble: 6 × 3
sampledate year ave_air_temp_adjusted
<date> <dbl> <dbl>
1 1870-06-05 1870 20
2 1870-06-06 1870 18.3
3 1870-06-07 1870 17.5
4 1870-06-09 1870 13.3
5 1870-06-10 1870 13.9
6 1870-06-11 1870 15
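A sketch of how the two tables shown above could be brought together by year: first summarise the daily temperatures to one mean value per year, then join this onto the ice table. All values below are made up for illustration; only the column names are taken from the printed tables, and the names annual_temp and ice_temp are hypothetical.

```r
library(dplyr)

# Made-up ice data with the same column names as above (subset of columns)
ice <- tibble::tibble(
  lakeid       = factor(rep("Lake Mendota", 3)),
  ice_duration = c(110, 120, 100),
  year         = c(1870, 1871, 1872)
)

# Made-up daily temperature data with the same column names as above
temp <- tibble::tibble(
  sampledate = as.Date(c("1870-06-05", "1870-06-06", "1870-06-07",
                         "1871-06-05", "1871-06-06")),
  year       = c(1870, 1870, 1870, 1871, 1871),
  ave_air_temp_adjusted = c(20, 18, 16, 15, 17)
)

# One mean temperature per year
annual_temp <- temp |>
  group_by(year) |>
  summarise(mean_temp = mean(ave_air_temp_adjusted, na.rm = TRUE))

# Keep all ice rows; years without temperature data get NA for mean_temp
ice_temp <- left_join(ice, annual_temp, by = "year")

ice_temp
```

left_join keeps every row of the first table, so years that have ice data but no temperature data stay in the result with a missing mean temperature.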
Ideas - known methods
left_join to combine the tables with annual mean temperature and ice duration
Data from Fu et al. 2015, Nature Cell Biology
Data found via Tutorial on heat maps using this data
3 csv files:
heatmap_genes.csv: A list of the names of interesting genes to look at (genes used in Figure 6b in the paper)
DE_results.csv: Gene expression in luminal cells in pregnant versus lactating mice
normalized_counts: Normalized counts for genes for the different samples
Data cleaning:
janitor::clean_names function to make the column headers nicer
DE_results and normalized_counts by their shared columns
select to remove columns you don’t need for analysis to get a better overview
p_value < 0.01 & abs(logFC) > 0.58)
Data analysis:
pheatmap::pheatmap()
pheatmap takes a matrix as input (use as.matrix on tibble to transform)
scale function
pheatmap can scale but with ggplot you have to scale before plotting
Working with real research data
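For the gene expression data set, a sketch of the matrix and scaling steps for the heat map. The counts below are invented and the gene and sample names are hypothetical; the real column names depend on the csv files. The plotting call assumes that pheatmap is installed.

```r
library(dplyr)

# Invented normalized counts: one row per gene, one column per sample
counts <- tibble::tibble(
  gene     = c("gene_a", "gene_b", "gene_c"),
  sample_1 = c(10, 200, 35),
  sample_2 = c(12, 180, 30),
  sample_3 = c(30, 90, 31),
  sample_4 = c(28, 100, 36)
)

# pheatmap needs a numeric matrix; gene names go into the row names
count_mat <- counts |>
  select(-gene) |>
  as.matrix()
rownames(count_mat) <- counts$gene

# Manual row-wise scaling (needed before plotting with ggplot):
# scale() works on columns, so transpose, scale, and transpose back
scaled_mat <- t(scale(t(count_mat)))

# pheatmap can do the row scaling itself (skipped if not installed)
if (requireNamespace("pheatmap", quietly = TRUE)) {
  pheatmap::pheatmap(count_mat, scale = "row")
}
```

After scaling, every gene has mean 0 and standard deviation 1 across samples, so the colors show the expression pattern of each gene rather than its absolute level.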
Meet in your group (if you want)
Work on your data set
Take breaks as you need and be back at 2 p.m.
Keep an eye on your group and the general chat
In 1-2 mins:
What was the highlight of your analysis?
What was difficult?
If you want: Share a screenshot in the chat or share your screen
Please take 10 mins to complete the feedback survey for the Graduate center (don’t use Internet Explorer)
We learned a lot of stuff!
Selina Baldauf // Bring your own data